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When a large body of data from diverse experiments is analyzed using a theoretical 
model with many parameters, the standard error matrix method and the general tools for 
evaluating errors may become inadequate. We present an iterative method that significantly 
improves the reliability of the error matrix calculation. To obtain even better estimates of 
the uncertainties on predictions of physical observables, we also present a Lagrange multi- 
plier method that explores the entire parameter space and avoids the linear approximations 
assumed in conventional error propagation calculations. These methods are illustrated by 
an example from the global analysis of parton distribution functions. 



1 Introduction 



The subject of this paper is a problem that arises when a large body of data from di- 
verse experiments is analyzed according to a theoretical model that has many adjustable 
parameters. Consider a generic data fitting problem based on experimental measurements 
{Dj, 1 = 1,..., N} with errors {(Ji}. The data are to be compared to predictions {T/} from 
a theoretical model with unknown parameters {oj, i = 1, . . . ,n}. A common technique for 
comparing data with theory is to compute the function defined by 



or a generalization of that formula if correlations between the errors are known in terms of 
a set of correlation matrices. The physics objectives are (i) to find the best estimate of the 
parameters {oj} and their uncertainties, and (ii) to predict the values and uncertainties of 
physical quantities {X^°'\ a = 1,2, . . .} that are functions of the {flj}. 

If the errors are randomly distributed, and the correlations well determined, then stan- 
dard statistical methods of minimization ^ apply, and established fitting tools like 
the CERN Library program MINUIT can be employed. However, real problems are often 
more complex. This is particularly true in a "global analysis," where the large number of 
data points {-D/} do not come from a uniform set of measurements, but instead consist of 
a collection of results from many experiments on a variety of physical processes, with di- 
verse characteristics and errors. The difficulties are compounded if there are unquantified 
theoretical uncertainties, if the number of theoretical parameters n is large, or if the best 
parametrization cannot be uniquely defined a priori. All of these difficulties arise in the 
global analysis of hadronic parton distribution functions (PDFs) §,0, which originally 
motivated this investigation. Several groups have addressed the question of estimating errors 
for the PDF determinations |, ^, 0. But the problem is clearly more general than that 
application. 

Of the many issues that confront a global analysis, we address in this paper two, for 
which we have been able to significantly improve on the traditional treatment. The improve- 
ments allow a more reliable determination of the uncertainties of {oj} and {X*^"^} in complex 
systems for which conventional methods may fail. To define these problems, we assume the 
system can be described by a global fitting function Xgiobai; for short, that characterizes 
the goodness-of-fit for a given set of theory parameters {aj}. This distills all available 
information on the theory and on the global experimental data sets, including their errors 
and correlations. One finds the minimum value Xo of X^y ^ind the best estimate of the theory 
parameters are the values {a°} that produce that minimum. The dependence of on {oj} 
near the minimum provides information on the uncertainties in the {aj}. These are usually 
characterized by the error matrix and its inverse, the Hessian matrix Hij, where one assumes 
that can be approximated by a quadratic expansion in {aj} around {a^}. Once the Hes- 
sian is known, one can estimate not only the uncertainties of {flj}, but also the uncertainty 
in the theoretical prediction for any physical quantity X, provided the dependence of X on 
{ai} can be approximated by a hnear expansion around {a°}, and is thus characterized by 
its gradient at {a°} (cf. Sec. ^. 
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The first problem we address is a technical one that is important in practice: If the 
uncertainties are very disparate for different directions in the n-dimensional parameter space 
{tti}, i.e., if the eigenvalues of Hij span many orders of magnitude, how can one calculate the 
matrix Hij with sufficient accuracy that reliable predictions are obtained for all directions? 
To solve this problem, we have developed an iterative procedure that adapts the step sizes 
used in the numerical calculation of the Hessian to the uncertainties in each eigenvector 
direction. We demonstrate the effectiveness of this procedure in our specific application, 
where the standard tool fails to yield reliable results. 

The second problem we address concerns the reliability of estimating the uncertainty 
AX in the prediction for some physical variable X that is a function of the {aj}: How 
can one estimate AX in a way that takes into account the variation of over the entire 
parameter space {flj}, without assuming the quadratic approximation to ^^^^ the linear 
approximation to X that are a part of the error matrix approach? We solve this problem 
by using Lagrange's method of the undetermined multiplier to make constrained fits that 
derive the dependence of on X. Because this method is more robust, it can be used by 
itself or to check the reliability of the Hessian method. 

Section 2 summarizes the error matrix formalism and establishes our notation. Section 
3 describes the iterative method for calculating the Hessian, and demonstrates its superiority 
in a concrete example. Section 4 introduces the Lagrange multiplier method and compares 
its results with the Hessian approach to the same application. Section 5 concludes. 

2 Error Matrix and Hessian 

First we review the well-known connection between the error matrix and the Hessian matrix 
of second derivatives. We emphasize the eigenvector representations of those matrices, which 
are used extensively later in the paper. 

The basic assumption of the error matrix approach is that can be approximated by 
a quadratic expansion in the fit parameters {oj} near the global minimum. This assumption 
will be true if the variation of the theory values Tj with {oj} is approximately linear near 
the minimum. Defining the displacement of parameter Oj from its value a° 

at the minimum, we have 



where the derivatives are evaluated at the minimum point i/i = and Hij are the elements 
of the Hessian matrix^ There are no linear terms in yi in (|^), because the first derivatives 
of are zero at the minimum. 

Being a symmetric matrix, H^j has a complete set of n orthonormal eigenvectors v}^'^ = 

^We include a factor 1/2 in the definition of H, as is the custom in high energy physics. 




(2) 



(3) 
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Vik with eigenvalues : 

^HijVjk = GkVik (4) 
j 

^VijVik = Sjk. (5) 

i 

These eigenvectors provide a natural basis to express arbitrary variations around the mini- 
mum: we replace {i/i} by a new set of parameters {zi} defined by 



These parameters have the simple property that 

i 

In other words, the surfaces of constant spheres in {zi} space, with Ax^ the squared 

distance from the minimum. 

The orthonormality of Vij can be used to invert the transformation (^: 

j 

The Hessian and its inverse, which is the error matrix, are easily expressed in terms of the 
eigenvalues and eigenvector components: 

Hij = ^ekVikVjk (9) 

k 

{H'^)ij = ^ — VikVjk. (10) 
k 

Now consider any physical quantity X that can be calculated according to the theory 
as a function of the parameters {oj}. The best estimate of X is the value at the minimum 
Xq = X(a°). In the neighborhood of the minimum, assuming the first term of the Taylor- 
series expansion of X gives an adequate approximation, the deviation of X from its best 
estimate is given by 

dX 

AX = x-Xo = Y,-Q-m= JZ^.^i (11) 

i i 

where 

are the components of the 2;-gradient evaluated at the global minimum, i.e., at the origin in 
2;-space. 

Since increases uniformly in all directions in 2;-space, the gradient vector Xi gives 
the direction in which the physical observable X varies fastest with increasing x"^- The 
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maximum deviation in X for a given increase in is therefore obtained by the dot product 
of the gradient vector Xj and a displacement vector Zi in the same direction with length 



^/Ax^, i.e., Zi = Xi^J Ax'^/'^jXj. For the square of the deviation, we therefore obtain the 
simpler formula 

{AXf = {X.Zr = Ax'J2^'- (13) 

i 

The traditional formula for the error estimate (AX)^ in terms of the original coordinates 
{Hi} can be derived by substituting Xj = |^ = ^^|^ ^ in and using (|) and ([T0|). 

j 

The result is 

This standard result can of course also be derived directly by minimizing in with 
respect to {flj}, subject to a constraint on X. 

Equations (|T3|) and (|T^ are equivalent if the assumptions of a linear approximation for 



X and a quadratic approximation for are exact. But in practice, the numerical accuracy 
of the two can differ considerably if these conditions are not well met over the relevant 
region of parameter space. To calculate the error estimate AX, we prefer to use Eq. ( pTSD 
using derivatives Xj calculated by finite differences of X at the points Zi = ±^^y Ax"^ (with 
Zj = for j 7^ i). This is generally more accurate, because it estimates the necessary 
derivatives using an appropriate step size, and thus reduces the effect of higher order terms 
and numerical noise. 

In a complex problem such as a global analysis, the region of applicability of the ap- 
proximations is generally unknown beforehand. A situation of particular concern is when the 
various eigenvalues {ej} have very different orders of magnitude — signaling that the function 
varies slowly in some directions of space, and rapidly in others. The iterative method 
described in the next section is designed to deal effectively with this situation. 



3 Iterative Procedure 

In practical applications, the Hessian matrix Hij is calculated using finite differences to es- 
timate the second derivatives in (|^). A balance must be maintained in choosing the step 
sizes for this, since higher-order terms will contribute if the intervals are too large, while 
numerical noise will dominate if the intervals are too small. This noise problem may arise 
more often than is generally realized, since the theory values {T/} that enter the calcula- 
tion may not be the ideally smooth functions of the fit parameters that one would associate 
with analytic formulas. For in complex theoretical models, the {Tj} may be computed from 
multiple integrals that have small discontinuities as functions of {oj} induced by adaptive 
integration methods. These numerical errors forbid the use of a very small step size in the 
finite difference calculations of derivatives. Furthermore, as noted above, the eigenvalues of 
Hij may span a wide range, so excellent accuracy is needed especially to get the smaller ones 
right. 
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The Procedure 



We want to evaluate Hij by sampling the values of x in a region of parameter space where 
Eq. d^) is a good approximation. In principle, the parameters {zi} are the natural choice for 
exploring this space; but of course they are not known in advance. We therefore adopt the 
following iterative procedure: 

1. Define a new set of coordinates {^i} by 

yi = ^Uijtj^j (15) 

j 

where Uij is an orthogonal matrix and {ti} are scale factors. In the first iteration, 
these are chosen as Uij = 6ij and = 1, so that C,i = Vi- This makes the first round 
of iteration similar to the usual procedure of taking derivatives with respect to a, . 
The iterative method is designed such that with successive iterations, Uij, t,, and C,i 
converge to Vij, A/l/cj, and Zi respectively. 

2. Calculate the effective second derivative matrix $jj defined by 

= XO + Y.'^^J^^^J (16) 

1 d^x^ 

" 2 9pe? ^^^^ 

using finite differences of the ^i. The step size in C,i is chosen to make the increase in 
due to the diagonal element ^a^f equal to a certain value • The choice of is 
determined by the particular physics application at hand. Naively, one might expect 
Sx"^ ^ 1 to be the right choice. That would indeed be appropriate for a function 
obeying ideal statistical requirements. But when the input to Xgiobai imperfect, a 
reasonable choice of must be based on a physics judgement of the appropriate 
range of that particular function. We therefore leave the choice of Sx"^ open in this 
general discussion.^] In any case, if the final results are to be trustworthy, they must 
not be sensitive to that choice. 

We calculate each off-diagonal second derivative by evaluating at the four corners 
of the rectangle (— 5i, —Sj), Sj), (— 5i, where 6i is the step size. 

This is a modification of the technique used in MINUIT P]. For the sake of efficiency, 
the MINUIT subroutine HESSE estimates off-diagonal elements using only one of those 
corners, together with values at (5j,0) and {0,5j) that are already known from calcu- 
lating the diagonal elements of the Hessian. Our method is slower by a factor of 4, 
but is more accurate because it fully or partly cancels some of the contributions from 
higher derivatives. The first derivatives dx'^/d^i are also calculated at this stage of the 
iteration and used to refine the estimate of the location of the minimum. 



^Cf. discussion in the following subsection on a sample problem. 
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3. Compute the Hessian according to 



4. Find the normahzed eigenvectors of the Hessian, as defined by Eqs. (0) and (^. 

5. Replace Uij by Vij, tj by ■>/ 1 / e j , and go back to step 1. The steps are repeated typically 
10-20 times, until the changes become small and converges to 6ij. 

This iterative procedure improves the estimate of the Hessian matrix, and hence of the 
error matrix, because in the later iterations it calculates the Hessian based on points that 
sample the region where Ax^ has the magnitude of physical interest. 



Results from a Sample Application 

As an example, we apply the iterative procedure to the application that motivated this 
study — the global analysis of PDFs 0] — and compare the results with those obtained from 
MINUIT. The experimental input for this problem consists of A= 1295 data points from 15 
different experimental data sets involving four distinct physical processes. All the potential 
complexities mentioned earlier are present in this system. The theory is the quark parton 
model, based on next-to-leading order perturbative Quantum Chromodynamics (QCD). The 
model contains n = 16 parameters that characterize the quark and gluon distributions 
in the proton at some low momentum scale Qq. From a calculational point of view, the 
theoretical model consists of the numerical integration of an integro-differential equation 
and multiple convolution integrals that are evaluated mostly by adaptive algorithms. The 
fitting function Xgiobai ^^i^ combines the published statistical and systematic errors 
of the data points in quadrature. The only correlated errors incorporated are the published 
overall normalization uncertainties of the individual experiments. The fitting program is 
the same as that used to generate the CTEQ parton distributions [^, |^. The global 
minimum for this system defines the CTEQ5M1 set of PDFs, for which Xo ~ 1200 0. 
We find that the eigenvalues {ej} of the Hessian for this system range over 5-6 orders of 
magnitude (distributed approximately exponentially). 

The value of Ax^ that corresponds to a given confidence level is well defined for an 
ideal experiment: e.g., Ax"^ < 1 defines the 68% confidence region. But in a real-world 
global analysis, the experimental and theoretical values in Eq. (|l]) include systematic errors, 
and the uncertainties u/ include subjective estimates of those errors, so the relation between 
Ax^ and confidence level requires further analysis. From independent detailed studies of 
the uncertainties |]TT], |T2|, we estimate that an appropriate choice of for the iterative 
calculation is around 10 in our application, and only the region Ax^ > 100 can be ruled out 
for the final fits. 

The error matrix approach relies on a quadratic approximation to in the neighborhood 
of the minimum. To test whether that approximation is valid, we plot as a function of 
distance along a particular direction in {aj} space, as shown in Fig. |I[ The direction chosen 
is a typical one — specifically it is the direction of the eigenvector with median eigenvalue. 
The dotted curve in Fig. |T] is the exact x^ and the solid curve is the quadratic approximation 
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(0). The approximation is seen to provide a rather good description of the function. Even at 
points where has increased by 50, the quadratic approximation reproduces the increase 
to 20% accuracy. 

1^601 " . I .... I .... I .... I .... I M 




-5.0 -2.5 0.0 2.5 5.0 
Distance 

Figure 1: Variation of with distance along a typical direction in parameter space. The 
dotted curve is the exact and the solid curve is the quadratic approximation based on 
the Hessian. The quadratic form is seen to be a rather good approximation over the range 
shown. 

To correctly measure the curvature of the quadratic approximation, it is important to 
fit points that are displaced by an appropriate distance. This can be seen from Fig. ^ which 
shows the difference between the two curves in Fig. |l| in the central region. The difference 
displays a small cubic contribution to x^. It also reveals contributions that vary erratically 
with a magnitude on the order of 0.03 . These fluctuations come from the noise associated 
with switching of intervals in the adaptive integration routines. Because the fluctuations 
are small, they do not affect our results in principle. But they do require care in estimating 
the derivatives. In particular, they would make flnite-difference estimates based on small 
intervals extremely unreliable. The iterative method avoids this problem by choosing a 
suitable scale for each eigenvector direction when evaluating the Hessian. 

Figures |1] and ^ show the behavior of along a single typical direction in the 16 
dimensional parameter space. Fig. |^ shows a complementary test of the iterative method for 
all possible directions. We have chosen 1000 directions at random in {zi\ space. We displace 
the parameters away from the minimum in each of these directions by a distance that makes 
Ax^ = 5. We then compute the value of Ax^ predicted by the quadratic approximation 
using the Hessian calculated by the iterative method and, for comparison, by the routine 
HESSE within the MINUIT package. The results are displayed in Fig. |^ as histograms, with 
Ax^ on the horizontal axis and the number of counts on the vertical axis. If x^ were quadratic 
in {oj}, then a perfect computational method would yield a delta function at Ax^ = 5. Fig. 
^ shows that: 
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3-2-10 1 2 3 

Distance 

Figure 2: Difference between and its quadratic approximation both of which are 
shown in Fig. |l|. A cubic contribution can be seen, along with a noticeable amount of 
numerical noise. The fine structure revealed here is small compared to the main variation of 
itself, which rises by 20 over the region shown, as can be seen in Fig. 1. 

• For the solid histogram — the result of the iterative procedure — the quadratic approxima- 
tion is close to the exact result in all directions, and hence Eq. (0) is a pretty good 
representation of x^- Quantitatively, the middle 68% of the distribution is contained in 
the region 5.4 ± 0.6 . 

• For the dotted histogram — based on the general purpose program MINUIT — the distribution 
is also spread around the expected value of 5, but it is very broadly distributed. This 
estimate of the Hessian is therefore unsatisfactory, because we might be interested in a 
quantity whose gradient direction is one for which the Hessian computed by MINUIT is 
widely off the mark. A major source of this problem is the numerical noise visible in 
Fig. ^ MINUIT uses a small step size to calculate the derivatives, and gets misled by the 
small-scale discontinuities in x^- For some directions, Ax^ even becomes negative because 
the errors in one or more of the small eigenvalues are big enough to allow their calculated 
values to become negative. (Within MINUIT, this circumstance elicits a warning message, 
and a constant is added to all the eigenvalues, which in the context of Fig. ^ corresponds 
to shifting the dotted distribution to the right.) 

Figure | shows the results of a similar study, in which the 1000 random directions are 
chosen only from the subspace of {zi} that is spanned by the 10 directions with the largest 
eigenvalues e,. The larger eigenvalues correspond to directions in which rises most rapidly, 
or in other words, directions in which the parameters are more strongly constrained by data. 
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Figure 3: Frequency distribution of Ax^ according to the Hessian approximation (|2D for 
displacements in random directions for which the true value is Ax^ = 5.0 . Solid histogram: 
using Hessian calculated by iterative method of Section 3; Dotted histogram: using Hessian 
calculated by MINUIT. 



Because the distance moved away from the minimum in {aj} space is smaller in this case, 
the quadratic approximation is generally better, so it is not surprising that the histograms 
are more sharply peaked than in Fig. ^. But the advantage of the iterative method remains 
apparent. 



Comment 

Information from the iteratively-improved Hessian provides a useful tool for refining the 
choice of functional forms used to parametrize a continuous degree of freedom in the theo- 
retical model. Explicitly, the relevant formulas are as follows. 

The length squared of the displacement vector in the space of fit parameters is 

E(«^-«°)' = E^^' = E- (19) 

I It 

while Ax^ = by (J^). Hence the directions in which the parameters are well determined 

(the steep directions) correspond to eigenvectors of the Hessian with large eigenvalues, while 
the shallow directions in which they are weakly determined correspond to small eigenvalues. 
The extreme values for any particular are 

ai = a\ ± Aai (20) 

where 
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Figure 4: Same as Fig. except that the displacements are restricted to the parameter 
subspace spanned by the 10 steepest directions. 



Equation (|2T|) can be used to see if each parameter is appropriately well constrained. Fur- 
thermore, the individual terms in the sum show the contributions to Aa^ from the various 
eigenvectors, so if a parametrization leads to a poorly defined minimum because it allows too 
much freedom — which is indicated by a failure of the iteration to converge for the smallest 
eigenvalues of the Hessian — it is easy to see which of the parameters are most responsible 
for the too-shallow directions. 



4 Lagrange Multiplier Method 

The Hessian, via its inverse which is the error matrix, provides a general way to propagate 
the uncertainties of experimental and theoretical input to the fit parameters {aj}, and thence 
on to a measurable quantity X({aj}) by Eqs. ( jlB]) or (JT^). But these equations are based on 
assuming that and X can be treated as quadratic and linear functions of respectively. 
In this section we describe a different approach, based on the mathematical method of the 
Lagrange undetermined multiplier, which avoids those assumptions. 

The Procedure 

Let Xq be the value of X at the minimum, which is the best estimate of X. For a fixed 
value of A, called the Lagrange multiplier, one performs a new minimization with respect to 
the fit parameters {aj}, this time on the quantity 

F = x' + A(X-Xo), (22) 

to obtain a pair of values (x^(A), X(A)). (The constant term — AXq here is not necessary, 
because it does not affect the minimization; but it makes the minimum value of F easier 
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to interpret.) At this new minimum, X^(A) is the lowest possible for the corresponding 
value X{X) of the physical variable X. Thus one achieves a constrained fit in which is 
minimized for a particular value of X. 

By repeating the minimization for many values of A, one maps out the parametrically- 
defined curve (x^(A), X(A)). Since A is just the parameter for this curve, its value is of no 
particular physical significance. The relevant range for A can be found by trial-and-error; or 
it can be estimated using the Hessian approximation, which predicts that A ~ — 2 Ax^/AX. 
In that approximation, F goes down by the same amount that goes up. 

One way to understand the Lagrange Multiplier method is to imagine that the quantity 
X is simply one of the fitting parameters, say ai. The variation of with oi could be 
mapped out by minimizing with respect to {02, . . . ,a„} for a sequence of values of a^. 
(That operation is indeed so useful that MINUIT provides a procedure MINOS to carry it out.) 
In the more general case that X is a function of all {ctj}, one wants to similarly minimize 
X^ for fixed values of X. That is exactly what the Lagrange Multiplier method does, since 
including the undetermined multiplier term in (|22| ) renders the {aj} independent in spite of 
the constraint on X. 

A more phenomenological way to understand the Lagrange Multiplier method is to 
imagine that X has just been measured, with result X^^w =t o'new To decide whether this 
hypothetical new measurement is consistent with the old body of data, one would add a term 
[(Xncw — -^)/o"new]^ to Xgiobai ^q. (P and rcdo the minimization. The added contribution 
to x^ consists of a constant, a linear term in X, and a quadratic term in X. This is equivalent 
to Eq. (p2D, because a constraint on X^ is equivalent to a constraint on X itself. 

The essential feature of the Lagrange Multiplier method is that, for a given A^^, it finds 
the largest range of X allowed by the global data set and the theoretical model, independent 
of any approximations. The full parameter space {oj} is explored in the minimization pro- 
cedure, not just the immediate neighborhood of the original x^ minimum as in the Hessian 
method, and no approximations based on a small deviation from the original minimum are 
needed. 

The only drawback to the Lagrange Multiplier method is that it can be slow computa- 
tionally, since it requires a separate series of minimizations for each observable X that is of 
interest. 

Example 

We now look at an example of the Lagrange Multiplier method from our application, the 
uncertainty of parton distribution functions. For the physical quantity X, we consider the 
cross section aw for production in pp collisions at the energy ^ = 1.8 TeV of the 
Tevatron collider at Fermilab. We want to estimate aw and the uncertainty on that estimate, 
based on the global analysis of parton distributions. 

The points in Fig. ^ show Xgiobai a function of awB in nanobarns, where B = 0.106 
is the branching ratio assumed for W ev. These points are obtained by the Lagrange 
Multiplier method using A = 0, ±1000, ±2000, ±3000, ±4000. They are thus discrete cases 
of Xgiobai versus aw, without approximations. 

The smooth curve in Fig. |^ is the parabola given in Eq. ([14D , using the Hessian computed 
by the iterative method and treating aw in the linear approximation. The comparison 
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Figure 5: Minimum as a function of the predicted cross section for W"^ production in 
pp collisions. Parabolic curve is the prediction of the iteratively improved Hessian method. 
Points are from the Lagrange multiplier method. 



between this curve and the discrete points from the Lagrange Multiplier calculation tests 
the quality of the quadratic and linear approximations and the reliability of the iterative 
calculation of Hij. For this application, we conclude that the improved Hessian method 
works very well, since the difference between the points and the curve is small, and indicates 
only a small cubic contribution. If the two results did not agree, the correct result would be 
the one given by the Lagrange Multiplier method. 

To estimate AX, the uncertainty of X consistent with the global analysis of existing 
data, one needs to specify what range of Axgiohni allowed. As discussed earlier, the 
acceptable limit of Xgiobai depends on the nature of the original definition of this fitting 



function, in the context of the specific system under study. For the case of aw, Ref. [|l2 



estimates Axl 



■lobal 



100, which translates into Aaw/o'w ~ ±3% according to Fig. |^. 



5 Conclusion 

We have addressed some computational problems that arise in a global phenomenological 
analysis, in which a complex theory with many parameters confronts a large number of data 
points from diverse experiments. 

The traditional error-matrix analysis is based on a quadratic approximation to the 
function that measures the quality of the fit, in the neighborhood of the minimum that 
defines the best fit. The iterative method proposed in Sec. 3 improves the calculation of 
the Hessian matrix which expresses that quadratic approximation, for a complex system in 
which general-purpose programs may fall short. The inverse of this improved version of the 
Hessian matrix is an improved version of the error matrix. It can be used to estimate the 
uncertainty of predictions using standard error matrix formulas. 
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Our iterative procedure for calculating the Hessian is implemented as an extension 
to the widely-used CERN fortran library routine MINUIT 0. The code is available from 
[http: / / www.pa.msu.edu/l ^pumplin/iterate/, or it can be requested by e-mail from 
pumplin@pa.msu.edu. Included with this code is a test example which demonstrates that 
the iterative method is superior to standard MINUIT even for a function that has no 
numerical noise of the type encountered in Fig. |^. 

The Lagrange Multiplier method proposed in Sec. 4 calculates the uncertainty on a 
given physical observable directly, without going through the error matrix. It thus avoids 
the assumption that the theoretical quantities can be approximated by linear functions of 
the search parameters, which is intrinsic to the Hessian approach. 

For simplicity, we have discussed only the problem of obtaining error estimates on a 
single quantity X. It is straightforward to generalize our methods to find the region allowed 
simultaneously for two or more variables by a given Ax^. For example, in the case of two 
variables X^^^ and X^'^\ the allowed region according to the Hessian method is the interior 
of an ellipse. The Lagrange multiplier method can be generalized for this case by adding two 
terms, AiX^^) + AgX^^), to x^- 

Although the Lagrange Multiplier procedure is conceptually simple and straightforward 
to implement, it is slow computationally because it requires many full minimizations to map 
out a function of X, and this must be done separately for each quantity X whose 

error limits are of interest. In contrast, once the Hessian has been determined from the 
global analysis, it can be applied to any physical observable. One needs only to compute 
the gradient dX/dai of the observable X and substitute into Eq. or better, to compute 
the gradient Xj = dX/dzi and substitute into Eq. (|13|). For computational efficiency, the 
iteratively calculated Hessian is therefore the method of choice, provided its linear approx- 
imations are sufficiently accurate. Whether or not that is the case can be determined by 
comparing Hessian and Lagrange Multiplier results. We use both methods in a detailed 
study of the uncertainties in the CTEQ5 parton distribution functions 0, |ll|, that is 
based on the work presented here. 
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